Abstract —The paper has two parts. The first deals with how to
use large random matrices as building blocks to model the massive
data arising from the massive (or large-scale) MIMO system. We
apply this model to distributed spectrum sensing and network
monitoring. This part boils down to streaming, distributed massive
data, for which a new algorithm is obtained and its performance is
derived using a central limit theorem recently obtained in the
literature. The second part deals with a large-scale testbed using
software-defined radios (particularly USRP); this 70-node network
testbed took us more than four years to develop. To demonstrate the
power of software-defined radio, we quickly reconfigure our testbed
into a testbed for massive MIMO. The massive data of this testbed
is of central interest in this paper. This is the first time we
model the experimental data arising from this testbed. To the best
of our knowledge, there is no other similar work.

Index Terms —Massive MIMO, 5G Network, Random Matrix, 
Testbed, Big Data. 

I. Introduction 

Massive or large-scale multiple-input, multiple-output
(MIMO), one of the disruptive technologies of the next-generation
(5G) communications system, promises significant gains
in wireless data rates and link reliability [1], [2]. In this paper,
we deal with the massive data aspects of the massive MIMO
system. We use the two terms massive data and big data
interchangeably, following the practice of the National
Research Council [3].

The benefits of massive MIMO are not limited to
higher data rates. Massive MIMO techniques also make green
communications possible. By using large numbers of antennas
at the base station, massive MIMO helps to focus the radiated
energy toward the intended direction while minimizing
the intra-cell and inter-cell interference. The energy efficiency is
increased dramatically as the energy can be focused with
extreme sharpness into small regions in space ©. It is shown 
in Q that, when the number of base station (BS) antennas M 
grows without bound, we can reduce the transmitted power of
each user proportionally to $1/M$ if the BS has perfect channel
state information (CSI), and proportionally to $1/\sqrt{M}$ if CSI is
estimated from uplink pilots. Reducing the transmit power of
the mobile users makes their batteries drain more slowly. Reducing
the RF power of the downlink cuts the electricity consumption of
the base station.

Massive MIMO also brings other benefits, including inexpensive
low-power components, reduced latency, and simplification of
the MAC layer [4]. A simpler network design could lower the
computing complexity, which saves more energy in the network
and makes the communications green.

Currently, most of the research on massive MIMO is focused
on its communications capabilities. In this paper, we promote
the insight that, very naturally, the massive MIMO system can
be regarded as a big data system. Massive waveform data,
coming in a streaming manner, can be stored and processed
at the base station with a large number of antennas, without
impacting the communication capability. In particular,
random matrix theory maps well onto the architecture
of a large antenna array. The random matrix theory data
model has been validated in [5] in the context of distributed
sensing. In this paper, we extend this work to the massive
MIMO testbed. In particular, we study functions of multiple
non-Hermitian random matrices and apply these variants
to the experimental data collected on the massive MIMO
testbed. The product of non-Hermitian random matrices shows
encouraging potential for signal detection, which motivates its
use for spectrum sensing and network monitoring. We also present
two concrete applications, demonstrated on our testbed,
that use the massive MIMO system as a big data system. From these
two applications, we foresee that, besides signal detection,
random-matrix-based big data analysis will drive more mobile
applications in the next-generation wireless network.

II. Modeling for Massive Data 

Large random matrices are used as models for the massive data
arising from the monitoring of the massive MIMO system. We
give some tutorial remarks to facilitate the understanding of
the experimental results.

A. Data Modeling with Large Random Matrices 

We assume $n$ observations of $p$-dimensional random
vectors $x_1, \ldots, x_n \in \mathbb{C}^{p \times 1}$. We form the data matrix
$X = (x_1, \ldots, x_n) \in \mathbb{C}^{p \times n}$, which is naturally a random matrix
due to the presence of ubiquitous noise. In our context, we are
interested in the practical regime $p = 100$ to $1{,}000$, while $n$
is assumed to be arbitrary. The possibility of an arbitrary sample
size $n$ makes the classical statistical tools infeasible. We are
led to consider the asymptotic regime [6]-[10]

$$p \to \infty, \quad n \to \infty, \quad p/n \to c \in (0, \infty), \tag{1}$$

while the classical regime [11] considers

$$p \text{ fixed}, \quad n \to \infty, \quad p/n \to 0. \tag{2}$$

Our goal is to reduce massive data to a few statistical
parameters. The first step often involves covariance matrix
estimation using the sample covariance estimator

$$S = \frac{1}{n} X X^H = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^H \tag{3}$$

that is a sum of rank-one random matrices 0 The sample 
covariance matrix estimator is the maximum likelihood estimator
(so it is optimal) for the classical regime (2). However, for
the asymptotic regime (1), this estimator is far from optimal.
We still use this estimator due to its special structure. See
[7]-[10] for modern alternatives to this fundamental algorithm. For
brevity, we use the sample covariance estimator throughout
this paper.

B. Non-Hermitian Free Probability Theory 

Once data are modeled as large random matrices, it is
natural for us to introduce non-Hermitian random matrix
theory into the problem at hand. Qiu’s book [9] gives an
exhaustive account of this subject in an independent chapter,
from a mathematical view. This paper is complementary to
our book [9] in that we bridge the gap between theory
and experiments. We want to understand how accurate this
theoretical model is for real-life data. See Section V
for details.

Roughly speaking, large random matrices can be treated as 
free matrix-valued random variables. “Free” random variables 
can be understood as independent random variables. The 
matrix size must be so large that the asymptotic theoretical 
results are valid. It is of central interest to understand this 
finite-size scaling in this paper. 

III. Distributed Spectrum Sensing 

Now we are convinced that large random matrices are valid
for experimental data modeling. The next natural question
is to test whether a signal or only noise is present in the
data. Both network monitoring and spectrum sensing can be
formulated as a matrix hypothesis testing problem for anomaly
detection.

A. Related Work 

Specifically, consider the $n$ samples $y_1, \ldots, y_n$, drawn from
a $p$-dimensional complex Gaussian distribution with covariance
matrix $\Sigma$. We aim to test the hypothesis

$$H_0 : \Sigma = I_p.$$

This test has been studied extensively in classical settings (i.e.,
$p$ fixed, $n \to \infty$), first in detail in [13]. Denoting the sample
covariance by $S_n = \frac{1}{n} \sum_{i=1}^{n} y_i y_i^H$, the LRT is based on the
linear statistic (see Anderson (2003) [11, Chapter 10])

$$L = \operatorname{Tr}(S_n) - \ln(\det S_n) - p. \tag{4}$$
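
As a minimal sketch (not the authors' code), the statistic in (4) can be evaluated with NumPy as shown below. The function name `lrt_statistic`, the simulated complex Gaussian noise matrix, and the sizes p = 20 and n = 400 are assumptions made only for illustration. `numpy.linalg.slogdet` is used so that the log-determinant stays numerically stable for larger p.

```python
import numpy as np


def lrt_statistic(Y):
    """Return L = Tr(S_n) - ln det(S_n) - p for a p x n sample matrix Y.

    Each column of Y is one observation; S_n = Y Y^H / n is the sample
    covariance matrix.
    """
    p, n = Y.shape
    S = (Y @ Y.conj().T) / n
    _, logdet = np.linalg.slogdet(S)  # log|det S|, stable for large p
    return float(np.trace(S).real - logdet - p)


# Illustrative noise-only data (assumed sizes): unit-variance complex Gaussian.
rng = np.random.default_rng(0)
p, n = 20, 400
Y = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)
L_noise = lrt_statistic(Y)
print(f"L under noise only: {L_noise:.4f}")
```

Because each eigenvalue s of S_n contributes s - ln(s) - 1, which is never negative, the statistic is zero only when S_n is exactly the identity and grows as S_n departs from it.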

Under $H_0$, with $p$ fixed, as $n \to \infty$, $nL$ is well known
to follow a $\chi^2$ distribution. However, with high-dimensional
data for which the dimension $p$ is large and comparable to
the sample size $n$, the $\chi^2$ approximation is no longer valid.
A correction to the LRT is given in Bai, Jiang, Yao and
Zheng (2009) [14] for large-dimensional covariance matrices, using
random matrix theory. In this case, a better approach is to use
results based on the double-asymptotic regime given by Assumption
1. Such a study has been done first under $H_0$ and later under
the spike alternative $H_1$. More specifically, under $H_0$, this
was presented in 0 using a CLT framework established 
in Bai and Silverstein (2004) [13). Under “Hi : YI has a 
spiked covariance structure as in Model A”, this problem was 
addressed only very recently in the independent works, fl6| 
and 0. We point out that fl6| (see also p8| ) considered 
a generalized problem which allowed for multiple spiked 
eigenvalues. The result in (16| was again based on the CLT 
framework of Bai and Silverstein (2004) 0 with their 
derivation requiring the calculation of contour integrals. The 
same result was presented in 0 in this case making use of 
sophisticated tools of contiguity and Le Cam’s first and third 
lemmas 0 


B. Spiked Central Wishart Matrix

Our problem is formulated as

$$H_0 : \Sigma = I_p, \qquad H_1 : S \in \text{Model A: Spiked central Wishart}. \tag{5}$$

Model A: Spiked central Wishart: Matrices with distribution
$\mathcal{CW}_p(n, \Sigma, 0_{p \times p})$ $(n \ge p)$, where $\Sigma$ has multiple distinct
“spike” eigenvalues $1 + \delta_1 > \cdots > 1 + \delta_r$, with $\delta_k > 0$
for all $1 \le k \le r$, and all other eigenvalues equal to 1.

Assumption 1. $n, p \to \infty$ such that $n/p \to c \ge 1$.


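Theorem III.1 (Passemier, McKay and Chen (2014) [20]).
Consider Model A and define

$$a = (1 - \sqrt{c})^2, \qquad b = (1 + \sqrt{c})^2. \tag{6}$$

Under Assumption 1, for an analytic function $f : U \to \mathbb{C}$,
where $U$ is an open subset of the complex plane which contains
$[a, b]$, we have

$$\sum_{i=1}^{p} f\!\left(\frac{\lambda_i}{p}\right) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}\!\left(p\mu + \sum_{\ell=1}^{r} \bar{\mu}(z_{0,\ell}),\ \sigma^2\right),$$

where $\lambda_1, \ldots, \lambda_p$ denote the eigenvalues of the random matrix,

$$\mu = \frac{1}{2\pi} \int_a^b f(x)\, \frac{\sqrt{(b - x)(x - a)}}{x}\, dx \tag{7}$$

$$\sigma^2 = \frac{1}{2\pi^2} \int_a^b \frac{f(x)}{\sqrt{(b - x)(x - a)}} \left[ \mathcal{P}\!\!\int_a^b \frac{f'(y)\, \sqrt{(b - y)(y - a)}}{x - y}\, dy \right] dx \tag{8}$$

with these terms independent of the spikes. The spike-dependent
terms $\bar{\mu}(z_{0,\ell})$, $1 \le \ell \le r$, admit

$$\bar{\mu}(z_{0,\ell}) = \frac{1}{2\pi} \int_a^b \frac{f(x)}{\sqrt{(b - x)(x - a)}} \left[ \frac{\sqrt{(z_{0,\ell} - a)(z_{0,\ell} - b)}}{z_{0,\ell} - x} - 1 \right] dx \tag{9}$$

where

$$z_{0,\ell} = \begin{cases} \dfrac{(1 + c\delta_\ell)(1 + \delta_\ell)}{\delta_\ell}, & \text{for Model A} \\[2mm] \dfrac{(1 + \nu_\ell)(c + \nu_\ell)}{\nu_\ell}, & \text{for Model B.} \end{cases}$$

The branch of the square root $\sqrt{(z_{0,\ell} - a)(z_{0,\ell} - b)}$ is chosen.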


As an application of Theorem III.1 for Model A, we consider
the classical LRT that the population covariance matrix
is the identity, under a rank-one spiked population alternative.

Here, we adopt our general framework to recover the
same result as [16] and [17] very efficiently, simply by calculating
a few integrals. Under $H_1$, as before, we denote by $1 + \delta$
the spiked eigenvalue of $\Sigma$. Since $nS_n \sim \mathcal{CW}_p(n, \Sigma, 0_{p \times p})$,
we now apply Theorem III.1 for the case of Model A to the
function

$$f_L(x) = \frac{x}{c} - \ln\!\left(\frac{x}{c}\right) - 1.$$

Let $\lambda_i$, $1 \le i \le p$, be the eigenvalues of $nS_n$. Since the
domain of definition of $f_L$ is $(0, \infty)$, we assume that $c > 1$
to ensure $a > 0$ (see (6)). Then, under Assumption 1,

$$L = \sum_{i=1}^{p} f_L\!\left(\frac{\lambda_i}{p}\right) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}\!\left(p\mu + \bar{\mu}(z_{0,1}),\ \sigma^2\right),$$

where $r = 1$ is used for one spike to obtain

$$\mu = 1 + (c - 1) \ln\left(1 - c^{-1}\right), \qquad \sigma^2 = -c^{-1} - \ln\left(1 - c^{-1}\right).$$
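
The closed forms lend themselves to a quick numerical illustration. The sketch below is an assumption-laden example, not the authors' implementation: it takes the noise-only values μ = 1 + (c − 1) ln(1 − 1/c) and σ² = −1/c − ln(1 − 1/c) with c = n/p > 1 and no spike term, and uses the Gaussian limit N(pμ, σ²) to pick a threshold for a target false-alarm rate on a single statistic L. The function names `null_mean_var` and `threshold`, and the sizes p = 50, n = 100, are hypothetical.

```python
import math
from statistics import NormalDist


def null_mean_var(c):
    """Per-dimension mean mu and variance sigma^2 of L under H0, for c = n/p > 1."""
    if c <= 1.0:
        raise ValueError("c = n/p must exceed 1 so that ln(1 - 1/c) is defined")
    log_term = math.log(1.0 - 1.0 / c)
    mu = 1.0 + (c - 1.0) * log_term
    var = -1.0 / c - log_term
    return mu, var


def threshold(p, n, p_fa):
    """Threshold gamma with P(L > gamma | H0) ~= p_fa under the Gaussian limit."""
    mu, var = null_mean_var(n / p)
    # Q^{-1}(p_fa) equals the standard normal quantile at 1 - p_fa.
    return p * mu + math.sqrt(var) * NormalDist().inv_cdf(1.0 - p_fa)


mu, var = null_mean_var(100 / 50)
print(f"mu = {mu:.6f}, sigma^2 = {var:.6f}")
print(f"threshold for P_fa = 0.05: {threshold(50, 100, 0.05):.4f}")
```

For c = 2 this gives μ = 1 − ln 2 and σ² = ln 2 − 1/2, so the mean of L grows linearly with p while its variance stays bounded.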


The spike-dependent term is

$$\bar{\mu}(z_{0,1}) = \delta_1 - \ln(1 + \delta_1).$$

The special case of one spike is also considered in [17]. These
results are in agreement with [16] and [17].

C. Distributed Streaming Data

For each server, equation (5) formulates the testing problem.
How do we formulate this problem when the data are spatially
distributed across $N$ servers? Our proposed algorithm is as
follows:

Algorithm 1

1) The $i$-th server computes the sample covariance matrix
$S_i$, $i = 1, \ldots, N$.

2) The $i$-th server computes the linear statistic
$L_i = \operatorname{Tr}(S_i) - \ln(\det S_i) - p$, $i = 1, \ldots, N$.

3) The $i$-th server communicates the linear statistic $L_i$,
$i = 1, \ldots, N$, to one server that acts as the coordinator.

4) Finally, the coordinator server obtains the linear statistics
$L_i$, $i = 1, \ldots, N$, via communication and sums up the
values: $L_D = L_1 + \cdots + L_N$.

5) All the above computing and communication are done
in parallel.

The communication burden is very low. The central ingredient
of Algorithm 1 is to exploit the Central Limit Theorem
of the linear statistic $L$ defined in (4). By means of
Theorem III.1, we have

$$L = \sum_{i=1}^{p} f_L\!\left(\frac{\lambda_i}{p}\right) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}\!\left(p\mu + \sum_{\ell=1}^{r} \bar{\mu}(z_{0,\ell}),\ \sigma^2\right).$$

Since $L_1, \ldots, L_N$ are Gaussian random variables, and the sum of
Gaussian random variables is also Gaussian, $L_D = L_1 + \cdots + L_N$
is also Gaussian, denoted as $\mathcal{N}(\mu_D, \sigma_D^2)$.

The false alarm probability for the linear statistic can be
obtained using standard procedures. If $L_D > \gamma$, the signal is
present; otherwise, the signal does not exist. The false alarm
probability is

$$\begin{aligned} P_{fa} &= \mathbb{P}(L_D > \gamma \mid H_0) = \mathbb{P}\!\left( \frac{L_D - \mu_D}{\sigma_D} > \frac{\gamma - \mu_D}{\sigma_D} \,\Big|\, H_0 \right) \\ &= \int_{\frac{\gamma - \mu_D}{\sigma_D}}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-t^2/2\right) dt = Q\!\left( \frac{\gamma - \mu_D}{\sigma_D} \right), \end{aligned}$$

where $Q(x) = \int_x^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-t^2/2\right) dt$. For a desired
false-alarm rate $\epsilon$, the associated threshold should be chosen such
that

$$\gamma = \mu_D + \sigma_D\, Q^{-1}(\epsilon).$$

To predict the detection probability, we need to know the
distribution of $L_D$ under $H_1$, which has been obtained using
Theorem III.1. The detection probability is calculated as

$$P_d = \mathbb{P}(L_D > \gamma \mid H_1) = \mathbb{P}\!\left( \frac{L_D - \mu_D}{\sigma_D} > \frac{\gamma - \mu_D}{\sigma_D} \,\Big|\, H_1 \right) = Q\!\left( \frac{\gamma - \mu_D}{\sigma_D} \right),$$

where $\mu_D$ and $\sigma_D^2$ now denote the mean and variance of $L_D$
under $H_1$.

IV. Massive MIMO Testbed and Data Acquisition


A. System Architecture and Signal Model 

The system architecture of the testbed is shown in Fig. 1.

[Figure 1 labels: K mobile terminals (#1, ..., #K), emulated by K USRPs with SDR.]

Fig. 1: System Architecture of Multi-User Massive MIMO
Testbed.

The general software-defined radio (SDR) universal software
radio peripheral (USRP) platform is used to emulate the
base station antennas in our testbed. We deployed up to 70
USRPs and 30 high-performance PCs to work collaboratively
as a large antenna array of the massive MIMO base station.
These USRPs are clock-synchronized by an AD9523
clock distribution board. The system design of this testbed
can be found in [22].

Our testbed has demonstrated the following initial capabilities:

Fig. 2: Reciprocity mode for TDD channel


a) Channel Reciprocity for Channel Measurement:
Channel matrix measurement is a critical task for a multi-user
massive MIMO system. For antennas $i$ and $j$, if the uplink
and downlink work in TDD mode, channel reciprocity
is useful for the pre-coding in the MIMO system. Channel
reciprocity means $h_{i,j} = h_{j,i}$, where $h_{i,j}$ represents the air channel
from antenna $i$ to antenna $j$ and $h_{j,i}$ the channel in the
reverse direction.

Given that $h$ is the air channel between antennas $i$ and $j$, the
measured channels $h_{i,j}$ and $h_{j,i}$ follow the model depicted in
Fig. 2, where $T(i)$, $R(j)$, $R(i)$, $T(j)$ represent the
effects of circuits such as up/down conversion, filters, etc.,
for both the uplink and the downlink.

Thus we have

$$h_{i,j} = T(i) \cdot h \cdot R(j), \qquad h_{j,i} = R(i) \cdot h \cdot T(j). \tag{10}$$

Usually, relative calibration is sufficient for the pre-coding,
as we have

$$\frac{h_{i,j}}{h_{j,i}} = \frac{T(i) \cdot R(j)}{R(i) \cdot T(j)}, \tag{11}$$

which is constant in the ideal situation.

The channel reciprocity described above includes the impact of
the circuits. Our measurements show that the ratio $h_{i,j}/h_{j,i}$ between
the downlink and uplink channel frequency responses for
antennas $i$ and $j$ is almost constant. For example, we collect
3 rounds of data within a time duration over which the channel can
be regarded as static. Thus 3 such ratios are obtained for a
specified link between the USRP node transmitting antenna 3 and
receiving antenna 22. The absolute values of the 3 ratios are 1.2486,
1.22, and 1.2351, respectively.
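
To make "almost constant" concrete, the short sketch below summarizes the three reported ratio magnitudes by their mean and relative spread. Only the three numbers come from the text; the variable names and the choice of (max − min)/mean as a spread measure are assumptions for illustration.

```python
from statistics import mean

# Magnitudes of the measured ratio h_ij / h_ji over three rounds (from the text).
ratios = [1.2486, 1.22, 1.2351]

avg = mean(ratios)
# Spread relative to the mean: one simple (assumed) measure of constancy.
rel_spread = (max(ratios) - min(ratios)) / avg

print(f"mean ratio = {avg:.4f}, relative spread = {100 * rel_spread:.2f}%")
```

The three values differ by roughly 2.3% of their mean, which is the sense in which the calibration ratio can be treated as constant over a static channel.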

b) Massive Data Acquisition for Mobile Users or Commercial
Networks: Consider the time evolving model described
below. Let $N$ be the number of antennas at the base
station. All the antennas start sensing at the same time. Each
time, on each antenna, a time series of $T$ samples
is captured and denoted as $x_i \in \mathbb{C}^{1 \times T}$, $i = 1, \ldots, N$. Then a
random matrix is formed from $N$ such vectors:

$$X_j = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}_{N \times T} \tag{12}$$

where $j = 1, \ldots, L$. Here $L$ means that we repeat the sensing
procedure $L$ times, so that $L$ such random matrices are
obtained. In the following sections, we are interested in variant
random matrix theoretical data models, including the product
of the $L$ random matrices and their geometric/arithmetic mean.
We call this the time evolving approach.


Besides the time evolving approach, we can also use a
different data format to form the random matrices. Suppose we
select $n$ receivers at the massive MIMO base station. At each
receiver, we collect $N \times T$ samples to get a random matrix
$X_i \in \mathbb{C}^{N \times T}$ with $i = 1, \ldots, n$. Similarly, we are interested
in functions of these random matrices. We call this the space
distributed approach.

In the next section, we specify which approach is used to
form the random matrices when a certain theoretical model is
used.

V. Random Matrix Theoretical Data Model and 
Experimental Validation 

We are interested in the eigenvalue distribution for every
data model. The results obtained from the experimental data
are compared with the theoretical distribution (if one exists). The
experimental data cover the noise-only case and the
signal-present case. Our testbed captures the commercial signal data
at 869.5 MHz.


A. Product of non-Hermitian random matrices

The eigenvalue distribution for the product of non-Hermitian
random matrices, so far, gives us the most visible information
to differentiate the noise-only and signal-present situations.
Here the time evolving approach is used. Denote the product
of non-Hermitian random matrices as

$$Z = \prod_{j=1}^{L} X_j. \tag{13}$$

In the experiment, $L$ is adjustable. In addition, a number
of such $Z$ are captured as time evolves, to investigate
whether the pattern changes with time. Every $Z$ can
be regarded as one snapshot. For both the noise and signal
experiments, we took 10 snapshots. All 10 snapshots are
put together to show the eigenvalue distribution more clearly.
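
The product in (13) can be explored on simulated noise with the sketch below. It rests on an assumption that is not stated in the text: since rectangular N × T matrices cannot be multiplied directly, each X_j is first replaced by a square N × N matrix with the same singular values (after scaling by 1/sqrt(T)), multiplied by a random Haar unitary. The helper names `haar_unitary` and `square_equivalent`, the sizes N = 60, T = 150, L = 2, and the tolerance band used to count eigenvalues are all hypothetical.

```python
import numpy as np


def haar_unitary(n, rng):
    """Draw an n x n Haar-distributed unitary via QR of a complex Gaussian matrix."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, R = np.linalg.qr(G)
    d = np.diag(R)
    return Q * (d / np.abs(d))  # fix column phases so that Q is Haar distributed


def square_equivalent(X, rng):
    """Assumed helper: N x N matrix sharing the singular values of X / sqrt(T)."""
    N, T = X.shape
    w, V = np.linalg.eigh(X @ X.conj().T / T)
    root = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T  # Hermitian square root
    return root @ haar_unitary(N, rng)


rng = np.random.default_rng(1)
N, T, L = 60, 150, 2
c = N / T

Z = np.eye(N, dtype=complex)
for _ in range(L):
    X = (rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T))) / np.sqrt(2)
    Z = Z @ square_equivalent(X, rng)

radii = np.abs(np.linalg.eigvals(Z))
r_inner = (1 - c) ** (L / 2)  # noise-only inner radius predicted by the ring law
inside = np.mean((radii > r_inner - 0.1) & (radii < 1.1))
print(f"predicted inner radius = {r_inner:.3f}, min |lambda| = {radii.min():.3f}")
print(f"fraction of eigenvalues in the (slightly widened) ring = {inside:.2f}")
```

Under this construction the eigenvalue moduli of simulated noise should concentrate between the predicted inner radius and 1; measured data would replace the simulated X.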

1) Eigenvalue Distributions for Noise-Only and Signal-Present:
First, we visualize the eigenvalue distribution on
the complex plane to see the difference between the noise-only
and signal-present cases.

Noise Only: When the eigenvalues of all the snapshots
are put together, we obtain Fig. 3, in which the red circle
represents the “Ring Law”.

Signal Present: When the eigenvalues of all the snapshots
are put together, we obtain Fig. 4, in which the inner radius of the
eigenvalue distribution is smaller than that of the ring law.

We also use the probability density diagram to show the
difference between the noise-only and signal-present cases, for
different $L$. Theorem V.1 gives the theoretical
values of the inner radius and outer radius of the ring law.


Theorem V.1. The empirical eigenvalue distribution of the $N \times T$
matrix product $\prod_{i=1}^{L} X_i$ converges almost surely to the same limit,
given by

$$f_{\prod_{i=1}^{L} X_i}(\lambda) = \begin{cases} \dfrac{1}{\pi c L}\, |\lambda|^{2/L - 2}, & (1 - c)^{L/2} \le |\lambda| \le 1 \\ 0, & \text{elsewhere} \end{cases}$$

as $N, n \to \infty$ with the ratio $c = N/n \le 1$ fixed.

We are interested in the probability density of
$r = |\lambda|$, which is described in Eq. (14), derived from
Theorem V.1:

$$f_{|\lambda|}(r) = \begin{cases} \dfrac{2}{cL}\, r^{2/L - 1}, & (1 - c)^{L/2} \le r \le 1 \\ 0, & \text{elsewhere.} \end{cases} \tag{14}$$

Fig. 3: The eigenvalue distribution for product of non-Hermitian
random matrices, noise only, all snapshots (N=68, T=170,
c=N/T=0.4, L=5).

Fig. 4: The eigenvalue distribution for product of non-Hermitian
random matrices, signal present, all snapshots.

Fig. 5: Probability of eigenvalue for product of the non-Hermitian
random matrices, both cases, with L = 5 (horizontal axis: |lambda|).


The PDF is also shown in Fig. 5 and Fig. 6 for different
$L$.

The above results show that the eigenvalue distribution follows
the ring law in this model for the noise-only case. The
signal-present case also exhibits a ring, but its inner radius is
much smaller than in the noise-only case, especially when $L$ is
large.
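
A small numerical check of the ring law is sketched below. It assumes the noise-only inner radius (1 − c)^(L/2) and a radial density of the form (2/(cL)) r^(2/L − 1) on the ring, and verifies by a midpoint rule that this density integrates to one. The function names and the midpoint integrator are illustrative choices; c = 0.4 and L = 5, 10 match the figure settings.

```python
def ring_inner_radius(c, L):
    """Noise-only inner radius of the ring law, (1 - c)^(L/2)."""
    return (1.0 - c) ** (L / 2.0)


def radial_pdf(r, c, L):
    """Assumed radial density of |lambda| on the ring, zero elsewhere."""
    if ring_inner_radius(c, L) <= r <= 1.0:
        return 2.0 / (c * L) * r ** (2.0 / L - 1.0)
    return 0.0


def integrate(f, a, b, steps=100_000):
    """Plain midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


c = 0.4
for L in (5, 10):
    r_in = ring_inner_radius(c, L)
    mass = integrate(lambda r: radial_pdf(r, c, L), r_in, 1.0)
    print(f"L = {L:2d}: inner radius = {r_in:.5f}, total probability = {mass:.6f}")
```

The inner radius falls from about 0.279 at L = 5 to about 0.078 at L = 10, which illustrates why the noise-only ring leaves less room near the origin as L grows.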

2) Empirical Effect of L to Differentiate Cases of Noise
Only and Signal Present:

Regarding the product of non-Hermitian random matrices,
the main difference observed between the noise-only and
signal-present cases concerns the inner circle radius of the
eigenvalue distribution.

According to the ring law, the inner circle radius of the
eigenvalue distribution for the noise-only case is given
by Eq. (15), which is a fixed value for given $L$ and $c$:

$$r_{\text{inner}} = (1 - c)^{L/2}. \tag{15}$$

Meanwhile, the radius shrinks when the signal is
present. In addition, for both cases, the inner circle radius
decreases with increasing $L$. The question is whether it becomes
easier to differentiate the two cases when the value of $L$ is
increased.

Fig. 6: Probability of eigenvalue for product of the non-Hermitian
random matrices, both cases, with L = 10.
Fig. 7: Shrinking eigenvalue ratio within the ring law inner
circle between the noise only and signal present cases.

Fig. 8: The eigenvalue distribution for geometric mean of
non-Hermitian random matrices, noise only, all snapshots, L=5
(N=66, T=165, c=N/T=0.4).

For the same $L$, we define $M_{\text{noise}}(L)|_{r < r_{\text{inner}}}$ as the number
of eigenvalues falling within the inner circle of the ring law,
measured for the noise-only case, and $M_{\text{signal}}(L)|_{r < r_{\text{inner}}}$ as the
number of eigenvalues falling within the inner circle of the
ring law, measured for the signal-present case, where
$r_{\text{inner}} = (1 - c)^{L/2}$ is the ring-law inner radius. Thus, we have a
ratio denoted as

$$\rho(L) = \frac{M_{\text{noise}}(L)|_{r < r_{\text{inner}}}}{M_{\text{signal}}(L)|_{r < r_{\text{inner}}}} \tag{16}$$

to represent the impact of $L$.

Fig. 7 shows the trend of the ratio with increasing $L$.
Generally, the ratio decreases with increasing $L$, indicating
that a larger $L$ gives better separation between the noise-only
and signal-present cases. However, the trend resembles a
decaying exponential function of $L$: when $L$
is greater than 10, the ratio does not change much.
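
The counting behind this ratio can be sketched as follows. The function names `count_inside` and `shrink_ratio` are hypothetical, the toy eigenvalue lists are made up purely to exercise the code, and returning `None` when no signal-present eigenvalue falls inside the circle is a design choice of this sketch rather than something specified in the text.

```python
def count_inside(eigenvalues, radius):
    """Number of eigenvalues with modulus strictly below the given radius."""
    return sum(1 for z in eigenvalues if abs(z) < radius)


def shrink_ratio(noise_eigs, signal_eigs, c, L):
    """Ratio of counts inside the ring-law inner circle: noise-only over signal-present."""
    r_inner = (1.0 - c) ** (L / 2.0)
    m_noise = count_inside(noise_eigs, r_inner)
    m_signal = count_inside(signal_eigs, r_inner)
    if m_signal == 0:
        return None  # ratio undefined when no signal eigenvalue lies inside
    return m_noise / m_signal


# Toy eigenvalues (not measured data), with c = 0.4 and L = 5.
noise = [0.9, 0.5 + 0.1j, 0.1j]
signal = [0.05, 0.1, 0.2j, 0.8]
print(shrink_ratio(noise, signal, c=0.4, L=5))
```

A smaller ratio means that, relative to noise, many more signal-present eigenvalues have moved inside the noise-only inner circle.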


B. Geometric Mean

Using the same data as in the previous subsection, the geometric
mean of the non-Hermitian random matrices can be obtained as

$$Z = \left( \prod_{j=1}^{L} X_j \right)^{1/L}. \tag{17}$$

The time evolving approach is used here. In this experiment,
we adjust $L$, and convergence is observed as $L$ is
increased. All the diagrams below include 10 snapshots of
data. Basically, in this case, the eigenvalues converge to the
outer unit circle and do not change much with increasing
$L$.


Noise Only: Fig. 8 to Fig. 10 show the eigenvalue distribution
of the geometric mean for the noise-only situation.

Signal Present: Fig. 11 to Fig. 13 show the eigenvalue distribution
of the geometric mean for the signal-present situation. Unlike
the noise-only case, the convergence of the eigenvalues is sensitive
to the value of $L$. With larger $L$, the distribution converges
more closely to the unit circle.

We also show the PDF of the eigenvalue absolute values for the
geometric mean in Fig. 14 and Fig. 15, for different $L$.

From all the visualized results for the geometric mean
model, we see that

• the eigenvalue distribution is similar to the ring law, but
the radius is not the same as for the product of non-Hermitian
random matrices.

• the difference between the inner radius and the outer radius
is larger for the signal-present case than for the noise-only
case.

• with increasing $L$, the “ring” converges more closely to the
outer circle. The absolute difference between noise-only
and signal-present does not actually grow with increasing
$L$.

Fig. 9: The eigenvalue distribution for geometric mean of
non-Hermitian random matrices, noise only, all snapshots, L=20
(N=66, T=165, c=N/T=0.4).

Fig. 10: The eigenvalue distribution for geometric mean of
non-Hermitian random matrices, noise only, all snapshots, L=60
(N=66, T=165, c=N/T=0.4).

Fig. 11: The eigenvalue distribution for geometric mean of
non-Hermitian random matrices, signal present, all snapshots,
L=5 (N=66, T=165, c=N/T=0.4).


Fig. 12: The eigenvalue distribution for geometric mean of
non-Hermitian random matrices, signal present, all snapshots,
L=20 (N=66, T=165, c=N/T=0.4).

Fig. 13: The eigenvalue distribution for geometric mean of
non-Hermitian random matrices, signal present, all snapshots,
L=60 (N=66, T=165, c=N/T=0.4).

Fig. 14: Probability of eigenvalue for geometric mean of the
non-Hermitian random matrices, both cases, with L = 5
(N=66, T=165, c=N/T=0.4).

Fig. 15: Probability of eigenvalue for geometric mean of the
non-Hermitian random matrices, both cases, with L = 60
(N=66, T=165, c=N/T=0.4).

C. Arithmetic Mean

The arithmetic mean of the non-Hermitian random matrices
is defined as

$$Z = \frac{1}{L} \sum_{j=1}^{L} X_j. \tag{18}$$

For both the noise-only and signal-present cases, we adjust the
value of $L$ to see the effect. We select $L = 5, 20, 100$.

Noise Only: Fig. 16 to Fig. 18 show the eigenvalue distribution
of the arithmetic mean of the $L$ non-Hermitian random
matrices, for the noise-only case.

Signal Present: Fig. 19 to Fig. 21 show the eigenvalue
distribution of the arithmetic mean of the $L$ non-Hermitian
random matrices, for the signal-present case.

The corresponding PDFs of the eigenvalue absolute values
of the arithmetic mean are also shown in Fig. 22 and Fig. 23.

From the visualized results of the eigenvalue distribution
for the arithmetic mean model, we see

• The eigenvalue distribution for both noise-only and
signal-present follows a similar ring law.

• The width of the ring for signal-present is larger than
that for noise-only.

• We cannot get extra benefit by increasing $L$, as the width
of the ring is not impacted by $L$.

Fig. 16: The eigenvalue distribution for arithmetic mean of
non-Hermitian random matrices, noise only, L=5
(N=66, T=165, c=N/T=0.4).

Fig. 17: The eigenvalue distribution for arithmetic mean of
non-Hermitian random matrices, noise only, L=20
(N=66, T=165, c=N/T=0.4).
Fig. 18: The eigenvalue distribution for arithmetic mean of
non-Hermitian random matrices, noise only, L=100
(N=66, T=165, c=N/T=0.4).

Fig. 19: The eigenvalue distribution for arithmetic mean of
non-Hermitian random matrices, signal present, L=5
(N=66, T=165, c=N/T=0.4).

Fig. 20: The eigenvalue distribution for arithmetic mean of
non-Hermitian random matrices, signal present, L=20
(N=66, T=165, c=N/T=0.4; horizontal axis: real(Z)).

Fig. 21: The eigenvalue distribution for arithmetic mean of
non-Hermitian random matrices, signal present, L=100
(N=66, T=165, c=N/T=0.4; horizontal axis: real(Z)).

D. Product of Random Ginibre Matrices

We study the product of $k$ independent random square
Ginibre matrices, $Z = \prod_{i=1}^{k} G_i$. When the random Ginibre
matrices $G_i$ are square, the eigenvalues of $ZZ^{\dagger}$ have the asymptotic
distribution $\rho^{(k)}(x)$ in the large matrix limit. In terms of
free probability theory, it is the free multiplicative convolution
product of $k$ copies of the Marchenko-Pastur distribution. In
this model, we applied the space distributed approach to form
the random matrices.

For $k = 2$, the spectral density is explicitly given by

$$\rho^{(2)}(x) = \frac{2^{1/3} \sqrt{3}}{12\pi} \, \frac{2^{1/3} \left(27 + 3\sqrt{81 - 12x}\right)^{2/3} - 6x^{1/3}}{x^{2/3} \left(27 + 3\sqrt{81 - 12x}\right)^{1/3}}, \tag{19}$$

where $x \in [0, 27/4]$. For general $k$, the explicit form of the
distribution is a superposition of hypergeometric functions of
the type ${}_kF_{k-1}$:

$$\rho^{(k)}(x) = \sum_{i=1}^{k} (\cdots)\; {}_kF_{k-1}\big(\cdots\big), \tag{20}$$

where ${}_pF_q\left([\{a_j\}_{j=1}^{p}]; [\{b_j\}_{j=1}^{q}]; x\right)$ stands for the
hypergeometric function of the type ${}_pF_q$. [The coefficients and the
parameters $a_j$, $b_j$ of (20) are not recoverable from the extracted
text.]

Fig. 22: Probability of eigenvalue for arithmetic mean of the
non-Hermitian random matrices, both cases, with L = 5
(N=66, T=165, c=N/T=0.4).

Fig. 23: Probability of eigenvalue for arithmetic mean of the
non-Hermitian random matrices, both cases, with L = 100
(N=66, T=165, c=N/T=0.4).

Fig. 24: Spectral density of eigenvalues for product of square
Random Ginibre Matrices, k=2

Fig. 25: Spectral density of eigenvalues for product of square
Random Ginibre Matrices, k=4

From the noise data captured by $k$ USRP sensors, we
obtained the histogram for the spectral density of the product
of the Ginibre random matrices. Fig. 24 to Fig. 26 show that
the histograms match the theoretical pdf well, for different $k$.
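
As a sanity check on the k = 2 density, the sketch below evaluates the closed form on [0, 27/4] and integrates it numerically. It assumes the standard normalization in which the density has total mass one and unit mean; the function names, the substitution x = t³ (used to tame the integrable singularity at x = 0), and the step count are choices of this sketch.

```python
import math


def rho2(x):
    """Spectral density for k = 2 on (0, 27/4]; zero outside that interval."""
    if x <= 0.0 or x > 27.0 / 4.0:
        return 0.0
    a = 27.0 + 3.0 * math.sqrt(max(0.0, 81.0 - 12.0 * x))
    num = 2.0 ** (1 / 3) * a ** (2 / 3) - 6.0 * x ** (1 / 3)
    den = x ** (2 / 3) * a ** (1 / 3)
    return 2.0 ** (1 / 3) * math.sqrt(3.0) / (12.0 * math.pi) * num / den


def moment(order, steps=100_000):
    """Integral of x^order * rho2(x) by the midpoint rule after substituting x = t^3."""
    t_max = (27.0 / 4.0) ** (1 / 3)
    h = t_max / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        x = t ** 3
        total += x ** order * rho2(x) * 3.0 * t * t  # dx = 3 t^2 dt
    return total * h


print(f"total mass ~ {moment(0):.4f}, mean ~ {moment(1):.4f}")
```

The same routine can be overlaid on a histogram of measured eigenvalues of $ZZ^{\dagger}$ to reproduce the kind of comparison reported for k = 2.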
Fig. 26: Spectral density of eigenvalues for product of square
Random Ginibre Matrices, k=6


E. Summary of Theoretical Validation by Experimental Data

We applied variant data models to the massive data collected
by our massive MIMO testbed. First, we found that the
theoretical eigenvalue distribution (if one exists) can be validated
by the experimental data for the noise-only case. The
random-matrix-based big data analytic model is successfully connected
to the experiment. Second, the signal-present case can be
differentiated from the noise-only case by applying the same
data model. This result reveals the potential of the
random-matrix-based data model for signal detection, although
future work on the performance analysis is needed.

VI. Initial Applications of Massive MIMO Testbed 
as Big Data System 

Besides signal detection, we demonstrated two applications
based on massive data analytics through the random-matrix
method. The theoretical model in Section V-A is used, i.e., we
mainly apply the product of non-Hermitian random matrices
to the collected mobile data to investigate the corresponding
eigenvalue distribution. Our aim is to make sense of massive
data by finding the hidden correlation between the
random-matrix-based statistics and the information. Once correlations
between causes and effects are empirically established, one can start
devising theoretical models to understand the mechanisms
underlying such correlations, and use these models for prediction
purposes [23].

A. Mobile User Mobility Data 

In a typical scenario where the mobile user is communicating
with the massive MIMO base station while moving,
the uplink waveform data received at each receiving antenna
are collected. We applied the product of non-Hermitian random
matrices (Section V-A) to the data to observe the relationship
between the eigenvalue distribution and the behavior of the
moving mobile user. We use the data from 10 antennas associated
with 10 USRP receivers. Another USRP placed on a cart acts as
the mobile user, which moves along the hallway of the 4th floor
of Clement Hall at Tennessee Technological University.
The base station with up to 70 USRPs is on the same floor.
The experimental results show that the moving speed of the
mobile user is directly associated with the inner circle of
the eigenvalue distribution for the product of the non-Hermitian
random matrices.

The experiments include five cases with different moving
speeds.

a) Case 1: The Mobile User Stands in a Certain Place
without Moving

In this case, the mobile user has zero speed. What we
observe in Figure 27 is that the inner radius of the ring
hardly changes. The average inner radius is a little
less than 0.05 for the whole procedure.

Fig. 27: Ring law inner radius changing with time for moving
mobile user, case 1.

b) Case 2: The Mobile User Moves at a Nearly
Constant Walking Speed

In this case, the mobile user moves along a straight line at a
nearly constant walking speed from a distant point to a point
near the base station. Figure 28 shows the change of the inner
radius of the ring law with time. The mobile user is
actually on a cart pushed by a person. We see that the inner radius
is much bigger at the beginning, when the cart is accelerating
from almost motionless to walking speed, than during the rest of
the time. During the moving stage, the inner radius is much
smaller and very stable at around 0.005.

Fig. 28: Ring law inner radius changing with time for moving
mobile user, case 2.

c) Case 3: The Mobile User Moves at a Very Slow
Speed



Fig. 29: Ring law inner radius changing with time for moving 
mobile user, case 3. 


In this case, we move the mobile user at a very slow speed,
much lower than walking speed. Figure 29 shows that the inner
radius mostly fluctuates between 0.02 and 0.05. This range lies
below the stationary case (slightly less than 0.05) and above
the walking-speed case (around 0.005).

d) Case 4: The Mobile User Moves at Varying Speed:
Half the Time Walking, Half the Time at a Very Slow
Speed.



Fig. 30: Ring law inner radius changing with time for moving 
mobile user, case 4. 


In this case, we compare the impacts of different moving speeds
on the inner radius within a single figure. Figure 30 shows that
the radius in the first half is much smaller than that in the
second half. Correspondingly, the moving speed in the first half
is much higher than in the second half.

e) Case 5: The Mobile User Moves at Varying Speed: 
Half the Distance Walking, Half the Distance at a Very 
Slow Speed. 

Similar to case 4, the impacts of different speeds are
observed in one figure. A higher moving speed brings a
smaller inner radius of the eigenvalue distribution. Because
the walking-speed part covers the same distance as the slow-speed
part, it occupies less time than the latter, as shown in
Figure 31.



Fig. 31: Ring law inner radius changing with time for moving 
mobile user, case 5. 


All the above cases reveal a common observation: the faster
the mobile user moves, the smaller the inner radius of the
ring law. From the big data point of view, this suggests that
a massive MIMO base station can use the inner radius of the
ring law to estimate the moving status of the mobile user. In
general, more correlation in the signal brings a smaller inner
radius of the ring law. The result is therefore reasonable: a
faster mobile user causes a stronger Doppler effect on the
random signal received at the massive MIMO base station, i.e.,
more correlation is detected by the product of the Hermitian
random matrices.

B. Correlation Residing in Source Signal 

Besides the correlation introduced by the moving environment,
as in the above experiment, the correlation residing in the
transmitted signal also has a significant impact on the
eigenvalue distribution of the random matrix. In the section on
theoretical model validation, we only compared the noise-only
and signal-present cases. The correlation within the signal
causes the deviation of the eigenvalue distribution. In this
section, we intentionally adjust the autocorrelation level of
the generated signal that is transmitted by the mobile user.
The corresponding effect on the inner radius of the ring law is
then investigated by analyzing the data collected from the
antennas at the massive MIMO base station.

We generate the output signal according to Eq. (22),

y(n) = (1 + r) x(n) + r y(n - 1)    (22)

which can also be represented by the block diagram in
Figure 32. In the experiment, x(n) is set to Gaussian white
noise.

Fig. 32: Auto-regression filter used to generate the signal with
adjustable autocorrelation.
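
The recursion in Eq. (22) can be sketched in a few lines. This is
an illustrative example rather than the transmitter code used on
the testbed; the function names `ar1_signal` and `lag1_autocorr`,
the real-valued input, and the sample count are assumptions. For a
stationary first-order recursion driven by white noise, the lag-1
autocorrelation coefficient of y(n) equals r, which is what makes r
a convenient knob for the correlation level.

```python
import numpy as np

def ar1_signal(r, n_samples, rng):
    """Generate y(n) = (1 + r) x(n) + r y(n - 1) with x(n) Gaussian white noise."""
    x = rng.standard_normal(n_samples)
    y = np.empty(n_samples)
    prev = 0.0  # assumed zero initial state
    for n in range(n_samples):
        prev = (1.0 + r) * x[n] + r * prev
        y[n] = prev
    return y

def lag1_autocorr(y):
    """Sample autocorrelation coefficient at lag 1."""
    y = y - y.mean()
    return float(np.dot(y[1:], y[:-1]) / np.dot(y, y))

rng = np.random.default_rng(0)
for r in (0.3, 0.5, 0.7):
    y = ar1_signal(r, 100_000, rng)
    print(f"r = {r}: lag-1 autocorrelation = {lag1_autocorr(y):.3f}")
```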
Essential to this signal generator is an auto-regression filter
in which the parameter r controls the frequency response, as
shown in Figure 33. A bigger r leads to a sharper frequency
response, which introduces more correlation within the
transmitted signal. The resulting inner radius of the ring law
observed at the massive MIMO base station is shown in
Figure 34.

Fig. 33: Bigger r leads to sharper frequency response of the
AR filter for signal generator (vertical axis: magnitude
response in dB).
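
The sharpening of the frequency response can be checked directly
from Eq. (22), whose transfer function is
H(z) = (1 + r) / (1 - r z^-1). The sketch below evaluates the
magnitude response on the unit circle; the function names
`ar1_magnitude_db` and `peak_to_floor_db` are illustrative
assumptions. The gain is (1 + r) / (1 - r) at zero frequency and 1
at the Nyquist frequency, so the gap between the two grows with r.

```python
import numpy as np

def ar1_magnitude_db(r, omega):
    """Magnitude response in dB of H(z) = (1 + r) / (1 - r z^-1) at z = exp(j * omega)."""
    h = (1.0 + r) / (1.0 - r * np.exp(-1j * omega))
    return 20.0 * np.log10(np.abs(h))

def peak_to_floor_db(r):
    """Gain at omega = 0 minus gain at omega = pi, in dB."""
    return float(ar1_magnitude_db(r, 0.0) - ar1_magnitude_db(r, np.pi))

for r in (0.3, 0.4, 0.5, 0.6, 0.7):
    print(f"r = {r}: peak-to-floor gap = {peak_to_floor_db(r):.2f} dB")
```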



Fig. 34: Inner radius of the ring law versus r (horizontal axis:
r from 0.3 to 0.7; correlation increases with r). A bigger r
leads to a smaller inner radius.
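
The qualitative trend can be reproduced on synthetic data. The
sketch below is a simulation under stated assumptions, not a
re-analysis of the testbed measurements: each of the N rows is
modeled as an independent AR(1) sequence from Eq. (22) driven by
complex white noise, a single N x T block is used, and the inner
radius is taken as the smallest eigenvalue modulus after an assumed
Haar-unitary mapping and row normalization. The names `ar1_rows`
and `inner_radius`, the burn-in length, and the matrix sizes are
hypothetical.

```python
import numpy as np

def ar1_rows(r, n_rows, n_cols, rng, burn_in=200):
    """Independent rows of y(n) = (1 + r) x(n) + r y(n - 1), complex white-noise input."""
    total = n_cols + burn_in
    x = (rng.standard_normal((n_rows, total))
         + 1j * rng.standard_normal((n_rows, total))) / np.sqrt(2)
    y = np.empty_like(x)
    prev = np.zeros(n_rows, dtype=complex)
    for n in range(total):
        prev = (1.0 + r) * x[:, n] + r * prev
        y[:, n] = prev
    return y[:, burn_in:]  # drop the start-up transient

def inner_radius(x, rng):
    """Smallest eigenvalue modulus of the row-normalized square equivalent of x."""
    n, t = x.shape
    w, v = np.linalg.eigh(x @ x.conj().T / t)
    root = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, tri = np.linalg.qr(g)
    d = np.diag(tri)
    z = root @ (q * (d / np.abs(d)))  # multiply by a Haar unitary
    z = z / (np.sqrt(n) * z.std(axis=1, keepdims=True))
    return float(np.abs(np.linalg.eigvals(z)).min())

rng = np.random.default_rng(0)
for r in (0.0, 0.5, 0.9):
    radius = inner_radius(ar1_rows(r, 100, 200, rng), rng)
    print(f"r = {r}: estimated inner radius = {radius:.3f}")
```

Because the estimate is a minimum over a finite set of
eigenvalues, it fluctuates from run to run; averaging over several
windows would smooth the curve.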


C. Insights from Applications 

Both applications show that the correlation residing in the
signal can be matched to certain events in the network. In the
network under our monitoring, such correlations can be detected
and measured by our random-matrix-based data analysis method and
finally be used to visualize real events, such as the movement
of the mobile user or fluctuation of the source signal
correlation. This is a typical big data approach. The massive
MIMO system is not only a communications system but also an
expanded data science platform. We make sense of data by storing
and processing the massive waveform data. Information is not
discarded, so the energy of every bit or sample can be utilized
as fully as possible. To our best knowledge, this is the first
time that concrete experiments reveal the value of 5G massive
MIMO as a big data system. We believe that more applications
will emerge in the future.


VII. Conclusion 

The paper gives a first account of the 70-node testbed that
took TTU four years to develop. Rather than focusing on the
details of the testbed hardware development, we use the testbed
as a platform to collect massive datasets. The motivating
application of this paper is massive MIMO. First, using our
initial experimental data, we find that large random matrices
are natural models for the data arising from this testbed.
Second, the recently developed non-Hermitian free probability
theory makes theoretical predictions that agree very accurately
with our experimental results. This observation may be central
to our paper. Third, visualization of the datasets is provided
by the empirical eigenvalue distributions on the complex plane.
Anomaly detection can be obtained through visualization. Fourth,
when no visualization is required, we can formulate spectrum
sensing or network monitoring in terms of matrix hypothesis
testing. This formulation is relatively new for the massive MIMO
problem at hand. To our best knowledge, our work may be the
first to do so. A new algorithm is proposed for data distributed
across a number of servers.

At the time of writing (10), we feel that both the theoretical
understanding and the experimental work allow for extension to
other applications. First, thousands of vehicles need to be
connected. Due to mobility, streaming data that are spatially
distributed across N = 1,000 become essential. We have dealt
with the hypothesis testing problem. How do we reduce the data
size while retaining the statistical information in the data?
Sketching (24) is essential (10). Second, the testbed allows for
the study of data-analytic tools that will find applications in
the large-scale power grid, or Smart Grid (9). For example, the
empirical eigenvalue distribution of large random matrices is
used for the power grid in (25).